{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "1110103c-70d0-4ac0-8208-6a678b88deae",
   "metadata": {},
   "source": [
    "# Optimize PyTorch Models using Intel® Extension for PyTorch* (IPEX)\n",
    "\n",
    "This notebook guides you through the process of extending your PyTorch* code with Intel® Extension for PyTorch* (IPEX) with optimizations to achieve performance boosts on Intel® hardware.\n",
    "\n",
    "\n",
    "| Area                  | Description\n",
    "|:---                   |:---\n",
    "| What you will learn   | Applying Intel® Extension for PyTorch* (IPEX) Optimizations to a PyTorch workload in a step-by-step manner to gain performance boost\n",
    "| Time to complete      | 30 minutes\n",
    "| Category              | Code Optimization\n",
    "\n",
    "## Purpose\n",
    "\n",
    "This sample notebook shows how to get started with Intel® Extension for PyTorch* (IPEX) for sample Computer Vision and NLP workloads.\n",
    "\n",
    "The sample starts by loading two models from the PyTorch hub: **Faster-RCNN** (Faster R-CNN) and **distilbert** (DistilBERT). After loading the models, the sample applies sequential optimizations from Intel® Extension for PyTorch* (IPEX) and examines performance gains for each incremental change.\n",
    "You can make code changes quickly on top of existing PyTorch code to obtain the performance speedups for model inference.\n",
    "\n",
    "We will be generating synthetic data to be used for inference with sample computer vision and NLP workloads. We will first use stock PyTorch models to generate predictions. Then, with minimal code changes using Intel® Extension for PyTorch* (IPEX), we will see how speedups can be gained over stock PyTorch on Intel® hardware. We will also see how quantization features from Intel® Extension for PyTorch* (IPEX) can be used to reduce the inference time of a model.\n",
    "\n",
    "## Prerequisites\n",
    "\n",
    "\n",
    "| Optimized for          | Description\n",
    "|:---                    |:---\n",
    "| OS                     | Ubuntu* 20.04 or newer\n",
    "| Hardware               | Intel® Xeon® Scalable processor family\n",
    "| Software               | Intel® Extension for PyTorch*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "431d988d-40f1-4f98-96fd-2e17b4126eb4",
   "metadata": {},
   "source": [
    "# Key Takeaways"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7438fa45-81e6-4d42-847b-fbe895ae8eed",
   "metadata": {},
   "source": [
    "- Get started with Intel® Extension for PyTorch* (IPEX) for drop-in acceleration\n",
    "- Learn how to use the *optimize* method from Intel® Extension for PyTorch* (IPEX) to apply optimizations at Python frontend to the given model (nn.Module)\n",
    "- Learn how to use Quantization features from Intel® Extension for PyTorch* (IPEX) to convert model to INT8\n",
    "- Learn how to use Intel® Extension for PyTorch* (IPEX) Launch Script module to set additional configurations on top of the previously mentioned optimizations to boost performance"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d72174e0",
   "metadata": {},
   "source": [
    "# Samples"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "06a26381",
   "metadata": {},
   "source": [
    "## Install Intel® Extension for PyTorch* for CPU and dependency packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3ccfb9a4",
   "metadata": {},
   "outputs": [],
   "source": [
    "!python -m pip install transformers matplotlib"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7eb6281-5db9-4a4f-9c6a-3f9c132f30f9",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Computer Vision Workload - Faster R-CNN, Resnet50 Backbone"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d911257c-c9b0-4365-a308-95a4b3aea487",
   "metadata": {},
   "source": [
    "Faster R-CNN is a convolutional neural network used for object detection. We are going to use the **optimize** method from Intel® Extension for PyTorch* (IPEX) to apply optimizations. Following this, we will also use TorchScript to obtain performance gains."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "96966a7b-b036-4f1c-8dd8-3b90b76f98c9",
   "metadata": {},
   "source": [
    "Let's start by importing all the necessary packages and modules"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a4cb03a-f6b4-465b-9363-b435b00336c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "import torch\n",
    "import torchvision\n",
    "import os\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f613973-ffb0-482a-bc56-c295bce3c088",
   "metadata": {},
   "source": [
    "**Prepare Sample Data**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22eeae78-534e-4d44-a0f3-c48a4213ac3f",
   "metadata": {},
   "source": [
    "Let's generate a random image using torch to test performance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e75c62d-dbbb-47da-aa07-717e9719d86a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# set the device to cpu\n",
    "device = 'cpu'\n",
    "# generate a random image to observe speedup on\n",
    "image = torch.randn(1, 3, 1200, 1200)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "678f5795-dcd8-439d-8d98-f4197a2417e4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# explore image shape\n",
    "\n",
    "print(image.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32a54de2-76e7-42f9-9506-5574f5bb95a4",
   "metadata": {},
   "source": [
    "**Helper Functions**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43d0d63d-1d2b-4edc-8d36-0e2dd4ff3780",
   "metadata": {},
   "source": [
    "Some functions to help us with loading the model and summarizing the optimizations. The functions below will help us record the time taken to run and, plot comparison charts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c265eb1f-ef85-4ed1-941d-158d2ec16af6",
   "metadata": {},
   "outputs": [],
   "source": [
    "def load_model_eval_mode():\n",
    "    \"\"\"\n",
    "    Loads model and returns it in eval mode\n",
    "    \"\"\"\n",
    "    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=weights, progress=True,\n",
    "        num_classes=91, weights_backbone=weights_backbone).to(device)\n",
    "    model = model.eval()\n",
    "    \n",
    "    return model\n",
    "\n",
    "def get_average_inference_time(model, image):\n",
    "    \"\"\"\n",
    "    does a model warm up and times the model runtime\n",
    "    \"\"\"\n",
    "    with torch.no_grad():\n",
    "        # warm up\n",
    "        for _ in range(25):\n",
    "            model(image)\n",
    "\n",
    "        # measure\n",
    "        import time\n",
    "        start = time.time()\n",
    "        for _ in range(25):\n",
    "            output = model(image)\n",
    "        end = time.time()\n",
    "        average_inference_time = (end-start)/25*1000\n",
    "    \n",
    "    return average_inference_time\n",
    "\n",
    "def plot_speedup(inference_time_stock, inference_time_optimized):\n",
    "    \"\"\"\n",
    "    Plots a bar chart comparing the time taken by stock PyTorch model and the time taken by\n",
    "    the model optimized by Intel® Extension for PyTorch* (IPEX)\n",
    "    \"\"\"\n",
    "    data = {'stock_pytorch_time': inference_time_stock, 'optimized_time': inference_time_optimized}\n",
    "    model_type = list(data.keys())\n",
    "    times = list(data.values())\n",
    "\n",
    "    fig = plt.figure(figsize = (10, 5))\n",
    "\n",
    "    # creating the bar plot\n",
    "    plt.bar(model_type, times, color ='blue',\n",
    "            width = 0.4)\n",
    "\n",
    "    plt.ylabel(\"Runtime (ms)\")\n",
    "    plt.title(f\"Speedup acheived - {inference_time_stock/inference_time_optimized:.2f}x\")\n",
    "    plt.show()\n",
    "    \n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46eb1386-203a-4667-aa9c-9b26adcd02c4",
   "metadata": {},
   "source": [
    "**Baseline PyTorch Model**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c441b52d-597b-422c-bd01-524a687f8024",
   "metadata": {},
   "source": [
    "A baseline model is the simplest version of the model that can be loaded from the PyTorch hub. Let's load the baseline for Faster R-CNN model and get predictions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d4efa17-9abe-4595-bcc1-8b9c3b966211",
   "metadata": {},
   "outputs": [],
   "source": [
    "# model configs\n",
    "weights = torchvision.models.detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT\n",
    "weights_backbone = torchvision.models.ResNet50_Weights.DEFAULT"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "858fb895-8617-4997-a8ec-8974d00e4606",
   "metadata": {},
   "source": [
    "**Input Image Memory Format**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43a1fa1e-2e89-4c43-b1e4-7682686c2a4a",
   "metadata": {},
   "source": [
    "There are two ways to represent image data that are inputs to a CNN model. Channels-First, and Channels-Last. In Channels-First, the channels dimension comes first followed by height and width. For example - (3, 224, 224) or NCHW where N is batch size, C is channels, H is height, and W is width. In Channels-Last, the channels dimension comes last. For example - (224, 223, 3) or NHWC."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a12280f4-b8a1-4c9a-89d3-2f092396f431",
   "metadata": {},
   "source": [
    "**Channels-First**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59f09f4f-de2c-4006-83e0-d201b58097e6",
   "metadata": {},
   "source": [
    "PyTorch uses channels-first by default"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f054f770-c8d6-4fe7-baed-7ed77c510a13",
   "metadata": {},
   "outputs": [],
   "source": [
    "# send the input to the device and pass it through the network to\n",
    "# get the detections and predictions\n",
    "\n",
    "model = load_model_eval_mode()\n",
    "\n",
    "inference_time_stock = get_average_inference_time(model, image)\n",
    "\n",
    "print(f\"time taken for forward pass: {inference_time_stock} ms\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3285b34b-03a2-4654-8061-a4dd9f66d1e9",
   "metadata": {},
   "source": [
    "**Channels-Last**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31439e30-344e-455c-816a-a88eb868ede4",
   "metadata": {},
   "source": [
    "Channels-Last memory format is a different way of ordering NCHW tensors allowing us to make Channels-Last memory format optimizations on Intel® hardware"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0afc270-0d24-4e8d-a1a9-271dd66fc350",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = load_model_eval_mode()\n",
    "model = model.to(memory_format=torch.channels_last)\n",
    "image_channels_last = image.to(memory_format=torch.channels_last)\n",
    "\n",
    "inference_time_stock = get_average_inference_time(model, image_channels_last)\n",
    "\n",
    "print(f\"time taken for forward pass: {inference_time_stock} ms\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9cfff9fc-0e9d-4703-8369-da1db780d15e",
   "metadata": {},
   "source": [
    "Now that we have timed the stock PyTorch model, let's add minimal code changes from Intel® Extension for PyTorch* (IPEX) to obtain speedups. The minimal code changes are highlighted in the following cell"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aef650d5-02db-40a6-9688-90a03fee7da2",
   "metadata": {},
   "source": [
    "**Intel® Extension for PyTorch* (IPEX)**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05237e85-1039-4fb5-8651-bb815824a1d9",
   "metadata": {},
   "source": [
    "As described above, Intel® Extension for PyTorch* (IPEX) provides us with the ability to make minimal code changes to apply optimizations over stock PyTorch models using Intel® hardware. The simple code changes are indicated below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6356714f-c2b3-46c7-8dab-103d40054eb0",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = load_model_eval_mode()\n",
    "model = model.to(memory_format=torch.channels_last)\n",
    "image_channels_last = image.to(memory_format=torch.channels_last)\n",
    "#################### code changes ####################\n",
    "import intel_extension_for_pytorch as ipex\n",
    "model = ipex.optimize(model)\n",
    "######################################################"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6bdd609b-13ec-4bb7-93ec-b2f8a6dd0f68",
   "metadata": {},
   "outputs": [],
   "source": [
    "inference_time_optimized = get_average_inference_time(model, image_channels_last)\n",
    "\n",
    "print(f\"time taken for forward pass: {inference_time_optimized} ms\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5843fe0f-52c3-49a5-b231-f4ca514395a6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# plot performance gain bar chart\n",
    "\n",
    "plot_speedup(inference_time_stock, inference_time_optimized)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d263abf8-e0b6-4d99-a14a-67e46c197c3d",
   "metadata": {},
   "source": [
    "> **_NOTE:_**  If a below par performance is observed, please restart the notebook kernel."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d465fcc-8115-46de-9542-2c4c8d7c1771",
   "metadata": {},
   "source": [
    "**TorchScript**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e96cbe96-92c2-4429-b90b-4e78a4bac04d",
   "metadata": {},
   "source": [
    "TorchScript is a way to create serializable and optimizable models from PyTorch code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87254f28-ce41-4066-81c1-bcd646a0b871",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = load_model_eval_mode()\n",
    "model = model.to(memory_format=torch.channels_last)\n",
    "with torch.no_grad():\n",
    "    model.backbone = torch.jit.trace(model.backbone, image_channels_last, strict=False)\n",
    "    model.backbone = torch.jit.freeze(model.backbone)\n",
    "    inference_time_optimized = get_average_inference_time(model, image_channels_last)\n",
    "\n",
    "print(f\"time taken for forward pass: {inference_time_optimized} ms\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0ad01762-ded5-4380-b649-7a3db30c3b34",
   "metadata": {},
   "outputs": [],
   "source": [
    "# plot performance gain bar chart\n",
    "\n",
    "plot_speedup(inference_time_stock, inference_time_optimized)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11b2888e-1171-400d-a5f9-783e6c52f01e",
   "metadata": {},
   "source": [
    "## NLP Workload - DistilBERT Base Uncased"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62666465-5647-4ccb-a53c-d389d3261629",
   "metadata": {},
   "source": [
    "DistilBERT is a transformer model, smaller and faster than BERT. We will use the Quantization feature from Intel® Extension for PyTorch* (IPEX) to convert the model into INT8 for faster inference."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e10eba7b-3d6e-46e8-8baf-8eddcc5f9d2a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import DistilBertTokenizer, DistilBertModel, logging\n",
    "logging.set_verbosity_error()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b7899a8-205d-4ebc-b304-e9ee65ad3643",
   "metadata": {},
   "source": [
    "**Helper Functions**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "797de740-3a76-48cc-838c-ef8e981cd41b",
   "metadata": {},
   "source": [
    "Similar functions as before to help us load the model and summarize the optimizations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b0b68ba1-9215-4408-b3ee-5d6d610cffd7",
   "metadata": {},
   "outputs": [],
   "source": [
    "def load_model_eval_mode():\n",
    "    \"\"\"\n",
    "    Loads model and returns it in eval mode\n",
    "    \"\"\"\n",
    "    model = DistilBertModel.from_pretrained('distilbert-base-uncased-distilled-squad')\n",
    "    model.eval()\n",
    "    \n",
    "    return model\n",
    "\n",
    "def get_average_inference_time(model, inputs):\n",
    "    \"\"\"\n",
    "    does a model warm up and times the model runtime\n",
    "    \"\"\"\n",
    "    with torch.no_grad():\n",
    "        # warm up\n",
    "        for _ in range(25):\n",
    "            model(**inputs)\n",
    "\n",
    "        # measure\n",
    "        import time\n",
    "        start = time.time()\n",
    "        for _ in range(25):\n",
    "            outputs = model(**inputs)\n",
    "        end = time.time()\n",
    "        average_inference_time = (end-start)/25*1000\n",
    "    \n",
    "    return average_inference_time"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "81ec7560-5a3d-4450-badd-7ff3e5d6ae9b",
   "metadata": {},
   "source": [
    "Generate sample text and tokenize using the transformers tokenizer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5d94e1ea-03f8-4f2b-9c05-ae4a97d7d7d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# tokenizer for distilbert\n",
    "tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased-distilled-squad')\n",
    "\n",
    "# sample data\n",
    "question, text = \"Who was Jim Henson?\", \"Jim Henson was a nice puppet\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e146748d-7b31-4e62-82fb-92e3cd231f0e",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = load_model_eval_mode()\n",
    "\n",
    "inputs = tokenizer(question, text, return_tensors=\"pt\")\n",
    "\n",
    "inference_time_stock = get_average_inference_time(model, inputs)\n",
    "\n",
    "print(f\"time taken for forward pass: {inference_time_stock} ms\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fd79e93e-bc11-48b2-8a92-b74f29e4d2bf",
   "metadata": {},
   "source": [
    "**Quantization**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1cfbb3fe-d99f-4e74-abd0-04bdbbe6e632",
   "metadata": {},
   "source": [
    "Quantization allows us to perform operations and store tensors at a lower precision than FP32, like INT8 for example. This compact model and data representation results in a lower memory requirement."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c8ed525-7075-4a88-80d2-dfb739468994",
   "metadata": {},
   "source": [
    "Let's import the quantization modules"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "342873f5-72a3-4105-a2a5-68778d0599c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from intel_extension_for_pytorch.quantization import prepare, convert\n",
    "import intel_extension_for_pytorch as ipex"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b71e64d0-21f7-497f-86c3-d1afcaeefd6c",
   "metadata": {},
   "source": [
    "**Static Quantization**  \n",
    " Static quantization quantizes the weights and activations of the model. It fuses activations into preceding layers where possible. It requires calibration with a representative dataset to determine optimal quantization parameters for activations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3ef972ab-a272-45dd-bd55-fbd9f7db5017",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = load_model_eval_mode()\n",
    "\n",
    "inputs = tokenizer(question, text, return_tensors=\"pt\")\n",
    "\n",
    "jit_inputs  = tuple((inputs['input_ids'], inputs['attention_mask']))\n",
    "\n",
    "qconfig_mapping = ipex.quantization.default_static_qconfig_mapping # for static quantization\n",
    "prepared_model = ipex.quantization.prepare(model, qconfig_mapping, example_inputs=jit_inputs, inplace=False)\n",
    "\n",
    "for i in range(2):\n",
    "    calibration_output = prepared_model(**inputs)\n",
    "\n",
    "model = convert(prepared_model)\n",
    "with torch.no_grad():\n",
    "    model = torch.jit.trace(model, jit_inputs, strict=False)\n",
    "    model = torch.jit.freeze(model)\n",
    "    y = model(**inputs)\n",
    "    y = model(**inputs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7983dfa3-8bb2-4f9e-994c-24cbd556b94d",
   "metadata": {},
   "outputs": [],
   "source": [
    "inference_time_optimized = get_average_inference_time(model, inputs)\n",
    "\n",
    "print(f\"time taken for forward pass: {inference_time_optimized} ms\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "54e19218-dd71-4711-a25a-8e7c8a787d19",
   "metadata": {},
   "outputs": [],
   "source": [
    "# plot performance gain bar chart\n",
    "\n",
    "plot_speedup(inference_time_stock, inference_time_optimized)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e79ec28-6063-4d21-95d5-32be78bd49af",
   "metadata": {},
   "source": [
    "**Dynamic Quantization**  \n",
    " In dynamic quantization the weights are quantized ahead of time but the activations are dynamically quantized during inference"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4bef3494-b5c5-4acd-8431-71e3199a9764",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = load_model_eval_mode()\n",
    "\n",
    "inputs = tokenizer(question, text, return_tensors=\"pt\")\n",
    "\n",
    "jit_inputs  = tuple((inputs['input_ids'], inputs['attention_mask']))\n",
    "\n",
    "\n",
    "qconfig_mapping = ipex.quantization.default_dynamic_qconfig_mapping # for dynamic quantization\n",
    "prepared_model = ipex.quantization.prepare(model, qconfig_mapping, example_inputs=jit_inputs, inplace=False)\n",
    "model = convert(prepared_model)\n",
    "with torch.no_grad():\n",
    "    model = torch.jit.trace(model, jit_inputs, strict=False)\n",
    "    model = torch.jit.freeze(model)\n",
    "    y = model(**inputs)\n",
    "    y = model(**inputs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3cc311f2-6215-41ff-a7ed-e1942e0cbbde",
   "metadata": {},
   "outputs": [],
   "source": [
    "inference_time_optimized = get_average_inference_time(model, inputs)\n",
    "\n",
    "print(f\"time taken for forward pass: {inference_time_optimized} ms\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fb75733c-91d9-4d38-97a8-a5bb4b3063a8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# plot performance gain bar chart\n",
    "\n",
    "plot_speedup(inference_time_stock, inference_time_optimized)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bff70b14-26d9-4ef8-894a-6d0a40973324",
   "metadata": {},
   "source": [
    "## Intel® Extension for PyTorch* (IPEX) Launch Script"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63f71568-902f-4e9f-a130-cc5fae1961db",
   "metadata": {},
   "source": [
    "Default primitives of PyTorch and Intel® Extension for PyTorch* (IPEX) are highly optimized, there are things users can do improve performance. Setting configuration options properly contributes to a performance boost. However, there is no unified configuration that is optimal to all topologies. Users need to try different combinations by themselves."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f826ac48-17e2-48b4-ab2c-0a47620790a7",
   "metadata": {},
   "source": [
    "**Single instance for inference**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f300e009-f7cb-4403-8229-938bf89b0920",
   "metadata": {},
   "source": [
    "The launch script is provided as a module of Intel® Extension for PyTorch* (IPEX). Below are some of those configurations that can be set using the launch script for a single instance. The launch script can be run as a shell command from a Jupyter notebook or from the shell itself."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35ed3f44-c019-4997-a73d-a043ddfa12ee",
   "metadata": {},
   "source": [
    "To explore the features of the launch script module, we will be using a ResNet-50 model, which is a a convolutional neural network that is 50 layers deep.The model script is present in the scripts folder"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "edbd3506-dc86-4f18-b666-dfa9e9e3705a",
   "metadata": {},
   "source": [
    "It is recommended that the user check the output of [htop](https://htop.dev/) in an accompanying terminal to check the usage of cores while running the cells below. The output from htop looks as shown below."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b5015d8d-6ac6-41c4-a10e-509069b699ee",
   "metadata": {},
   "source": [
    "![htop](https://intel.github.io/intel-extension-for-pytorch/latest/_images/1ins_phy.gif)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91737a0e-1a14-4582-a9c4-63b24e963ff7",
   "metadata": {},
   "source": [
    "By running the below command, One main worker thread will be launched, then it will launch threads on 2 other physical cores."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "76265fe7-52cd-4574-9657-18efa5c25514",
   "metadata": {},
   "outputs": [],
   "source": [
    "!python -m intel_extension_for_pytorch.cpu.launch --ninstances 1 --ncores-per-instance 3 --log_path ./logs ./python/resnet50.py"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6fa2dd27-cf4f-4e1d-808a-271c35cd508b",
   "metadata": {},
   "source": [
    "Similarly by increasing the number of cores, we can see an improvement in the inference time as shown below "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6c522568-db0f-4438-ae1f-66623e07c96f",
   "metadata": {},
   "outputs": [],
   "source": [
    "!python -m intel_extension_for_pytorch.cpu.launch --ninstances 1 --ncores-per-instance 6 --log_path ./logs ./python/resnet50.py"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a3a00650-d15f-4aa7-b526-a21c6823d50a",
   "metadata": {},
   "source": [
    "We saw a small example usage of the launch script module. This [documentation](https://intel.github.io/intel-extension-for-pytorch/cpu/1.12.100+cpu/tutorials/performance_tuning/launch_script.html) provides many more examples to use the launch script. As mentioned earlier, each deep learning topology can benefit from custom tuning to achieve the best performance on top of the optimizations we have discussed so far."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bd6e13ed-0c0e-4e97-acaa-7a8dfba259c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"[CODE_SAMPLE_COMPLETED_SUCCESFULLY]\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  },
  "nbTranslate": {
   "displayLangs": [
    "*"
   ],
   "hotkey": "alt-t",
   "langInMainMenu": true,
   "sourceLang": "en",
   "targetLang": "fr",
   "useGoogleTranslate": true
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
